Quantitative Corpus Linguistics with R
CH2 Three important methods
Definition of a corpus
The notion of “corpus” refers to a machine-readable collection of (spoken or written) texts that were produced in a natural communicative setting, and the collection of texts is compiled with the intention (1) to be representative and balanced with respect to a particular linguistic variety or register or genre and (2) to be analyzed linguistically.
Types of corpora
general vs. specific
raw vs. annotated
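Several kinds of annotation:
What information corpora can provide
In short, what corpora provide is frequencies:
- frequencies of occurrence of various patterns
- frequencies of co-occurrence between patterns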
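The two kinds of frequency above can be illustrated with a minimal Python sketch (the book itself works in R). It assumes naive whitespace tokenization and treats "co-occurrence" as two distinct words appearing in the same sentence; both are simplifying assumptions, and the function name `occurrence_and_cooccurrence` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def occurrence_and_cooccurrence(sentences):
    """Count how often each word occurs, and how often each unordered
    pair of distinct words occurs together in the same sentence."""
    occ = Counter()
    co = Counter()
    for sentence in sentences:
        words = sentence.lower().split()  # naive whitespace tokenization (assumption)
        occ.update(words)
        # sort the distinct words so each pair has one canonical order,
        # and count each pair at most once per sentence
        for pair in combinations(sorted(set(words)), 2):
            co[pair] += 1
    return occ, co


sentences = ["the cat sat", "the dog sat", "a cat ran"]
occ, co = occurrence_and_cooccurrence(sentences)
print(occ["the"], occ["cat"])  # 2 2
print(co[("sat", "the")])      # 2
print(co[("cat", "the")])      # 1
```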
Frequency
Word frequencies.
But what counts as a word? The answer differs from language to language.
type vs. token
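A minimal sketch of the type/token distinction, again in Python rather than the book's R: tokens are all running words, types are the distinct word forms, and the type-token ratio (TTR) divides the number of types by the number of tokens. The regular expression used for tokenization is an assumption (precisely the "what counts as a word?" problem noted above), and the helper name `type_token_stats` is hypothetical.

```python
import re
from collections import Counter


def type_token_stats(text):
    # Tokenization rule (an assumption): lowercase the text, then take runs
    # of letters and apostrophes. A different definition of "word" changes the counts.
    tokens = re.findall(r"[a-z']+", text.lower())
    freq = Counter(tokens)   # frequency list: type -> number of tokens
    n_tokens = len(tokens)   # all running words
    n_types = len(freq)      # distinct word forms
    ttr = n_types / n_tokens if n_tokens else 0.0
    return freq, n_tokens, n_types, ttr


freq, n_tokens, n_types, ttr = type_token_stats(
    "The cat saw the dog and the dog saw the cat."
)
print(n_tokens, n_types, round(ttr, 3))  # 11 5 0.455
print(freq["the"])                       # 4
```

The TTR tends to fall as texts get longer, so raw TTRs are only comparable across texts of similar length.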
What frequencies can tell us (p. 14)
Collocation
Lexical Co-occurrence
Three kinds of co-occurrence
- collocation (the focus here)
- colligation
- collostruction
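One way to make collocation concrete is to count the words occurring within a fixed window around a node word and then score each candidate with an association measure. The Python sketch below uses pointwise mutual information (PMI) in its simple form log2(O / E), with E = f(node) * f(collocate) / N and no correction for window size. This measure and the ±span window are illustrative choices, not necessarily the ones the book recommends, and the function name `collocates` is hypothetical.

```python
import math
from collections import Counter


def collocates(tokens, node, span=1):
    """Count words within `span` tokens left/right of each occurrence of
    `node`, then score each one with simple pointwise mutual information."""
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - span), min(n, i + span + 1)):
            if j != i:
                observed[tokens[j]] += 1
    scores = {}
    for coll, o in observed.items():
        # expected co-occurrence under independence (no window-size correction)
        expected = freq[node] * freq[coll] / n
        scores[coll] = math.log2(o / expected)
    return observed, scores


tokens = "he drank strong tea she drank strong coffee they drank weak tea".split()
observed, scores = collocates(tokens, "strong", span=1)
for coll, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(coll, observed[coll], round(score, 3))
# coffee 1 2.585
# drank 2 2.0
# tea 1 1.585
```

On a toy text this small the scores are unstable. PMI also tends to inflate scores for rare words, which is one reason several competing association measures exist.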
Applications
- language teaching
- semantics
Concordance
(Lexico-)Grammatical Co-occurrence
Compared with collocation, concordance is concerned with the larger context in which a word occurs.
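A concordance is usually displayed as key-word-in-context (KWIC) lines. The Python sketch below assumes whitespace tokenization and uses a hypothetical `kwic` helper. It prints each occurrence of a node word with a few tokens of left and right context, right-aligning the left context so that the node words line up in a column.

```python
def kwic(tokens, node, width=3):
    """Build one key-word-in-context line per occurrence of `node`
    (case-insensitive), with up to `width` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # right-align the left context so the node words form a column
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


tokens = "She gave him a book and he gave her a pen".split()
lines = kwic(tokens, "gave")
for line in lines:
    print(line)
```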